[SPARK-56914][SQL] Simplify decimal arithmetic codegen under ANSI mode #55939

Draft
gengliangwang wants to merge 4 commits into apache:master from gengliangwang:SPARK-56914-decimal-arithmetic

@gengliangwang
Member

Title: [SPARK-56914][SQL] Refactor decimal arithmetic codegen under ANSI mode
Base: master (stacked on PR 3 - SPARK-56911)
Head: gengliangwang:SPARK-56914-decimal-arithmetic

What changes were proposed in this pull request?

Use CastUtils.changePrecisionExact / changePrecisionOrNull (added in SPARK-56911) from the DecimalType.Fixed branches of:

  • BinaryArithmetic.doGenCode (covers Add / Subtract / Multiply on Decimal).
  • BinaryDivModLike.doGenCode (covers Divide / IntegralDivide / Remainder / Pmod on Decimal).
  • BinaryArithmetic.checkDecimalOverflow (eval path used by both groups via numeric.plus/minus/times/div).

Each call site goes from eval1.$op(eval2).toPrecision(p, s, ROUND_HALF_UP, !failOnError, ctx) + a 4-line null check to a single CastUtils.changePrecision{Exact,OrNull} call.

Why are the changes needed?

Part of SPARK-56908 (umbrella). Decimal arithmetic is widespread in TPC-DS plans, and the BinaryArithmetic Decimal branch was one of the longer ANSI codegen bodies still emitted inline.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

build/sbt "catalyst/testOnly *ArithmeticExpressionSuite *DecimalSuite"

60/60 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

@gengliangwang
Member Author


Stack overview (SPARK-56908 umbrella)

This PR is part of a stack of 8 PRs against SPARK-56908. Order:

  1. [SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode #55934 (base of this stack)
  2. [SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode #55935
  3. [SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode #55936
  4. [SPARK-56912][SQL] Simplify Cast to boolean codegen under ANSI mode #55937
  5. [SPARK-56914][SQL] Simplify decimal arithmetic codegen under ANSI mode #55939 (depends on #55936)
  6. [SPARK-56913][SQL] Simplify BinaryArithmetic byte/short codegen under ANSI mode #55938 (independent)
  7. [SPARK-56915][SQL] Simplify MakeDate/MakeInterval codegen under ANSI mode #55940 (independent)
  8. [SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode #55941 (independent)

PRs 1-4 are linearly stacked on each other (each branch is based on the previous one). PR 5 (decimal arithmetic) is stacked on top of PR 3 (cast decimal) since it uses CastUtils.changePrecisionExact. PRs 6, 7, 8 branch off master independently.

### What changes were proposed in this pull request?

Introduce `CastUtils.java` and use it from `Cast.scala` to collapse the
multi-line ANSI overflow-check codegen for casts that target `int` and
`long` into one-line static-method calls. Source and target `DataType`
constants used in the overflow error message live as `private static
final` fields on the helper class, so the happy path performs no per-row
`references[]` lookups.

Helpers added:
* `longToIntExact(long)` for narrowing `long -> int`.
* `floatToIntExact(float)`, `doubleToIntExact(double)` for fractional
  -> int.
* `floatToLongExact(float)`, `doubleToLongExact(double)` for fractional
  -> long.
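
As a rough illustration of what two of these helpers might look like, here is a minimal sketch. The class name `CastUtilsSketch`, the type-name constants, and the use of `ArithmeticException` are assumptions made for this example; the real helpers raise Spark's own cast-overflow error and their bodies may differ.

```java
// Illustrative stand-in only: method names mirror the helpers listed above,
// but the bodies and the exception type are assumptions, not the Spark source.
final class CastUtilsSketch {
  // Type names used in the error message are constants, so the success
  // path does no per-row lookups.
  private static final String LONG_TYPE = "BIGINT";
  private static final String DOUBLE_TYPE = "DOUBLE";
  private static final String INT_TYPE = "INT";

  private CastUtilsSketch() {}

  // long -> int: succeeds only when the value round-trips through int.
  static int longToIntExact(long v) {
    if (v == (int) v) {
      return (int) v;
    }
    throw overflow(String.valueOf(v), LONG_TYPE, INT_TYPE);
  }

  // double -> int: truncates toward zero; NaN, infinities, and
  // out-of-range values throw.
  static int doubleToIntExact(double v) {
    if (Math.floor(v) <= Integer.MAX_VALUE && Math.ceil(v) >= Integer.MIN_VALUE) {
      return (int) v;
    }
    throw overflow(String.valueOf(v), DOUBLE_TYPE, INT_TYPE);
  }

  // Stand-in for the real cast-overflow error.
  private static ArithmeticException overflow(String value, String from, String to) {
    return new ArithmeticException(
        "Cannot cast " + value + " from " + from + " to " + to + " due to overflow");
  }
}
```

With helpers of this shape, a generated call site shrinks to a single expression such as `CastUtilsSketch.longToIntExact(value)`.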

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` and
  `castFractionToIntegralTypeCode` dispatch on the target type: `int`
  (and `long` for the fraction case) emit a `CastUtils.<...>Exact` call;
  byte/short targets keep the inline body (refactored in SPARK-56910).
* Eval paths for `castToInt` add ANSI `LongType` / `FloatType` /
  `DoubleType` cases, and `castToLong` adds `FloatType` / `DoubleType`
  cases, both delegating to the new helpers.

### Why are the changes needed?

Part of SPARK-56908. The current ANSI cast codegen emits 5-line inline
overflow blocks per call site. Multiplied across the many cast paths in
a TPC-DS plan, this contributes meaningfully to the generated source size
and to Janino compile time, and pushes whole-stage methods closer to the
64KB JVM method limit.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source
text changes.

### How was this patch tested?

`build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite
*CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite
*ExpressionClassIdentitySuite"` — 312/312 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

### What changes were proposed in this pull request?

Extend `CastUtils.java` with helpers for `byte` and `short` ANSI cast
targets and use them from `Cast.scala`. Drops the byte/short-target
dispatch (and the now-unused `lowerAndUpperBound` Scala helper) added
in SPARK-56909 -- after this PR, all integral and fractional narrowing
ANSI casts share the same `CastUtils.<...>Exact` one-line codegen.

Helpers added:
* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` / `castFractionToIntegralTypeCode`
  no longer dispatch on target type -- the helper-name pattern
  `${integralPrefix(from)}To${target.capitalize}Exact` covers all four
  target types.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for
  `ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType`
  source types that delegate to the new helpers; the existing
  `exactNumeric.toInt(b) + bounds-check` fallback now only handles the
  remaining `Decimal` source.
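
The helper-name pattern above can be sketched as plain string assembly. This is a hypothetical illustration: `ExactHelperNames` and its methods are invented for the example, and it assumes `integralPrefix(from)` yields the Java primitive name of the source type (for instance `long`).

```java
// Hypothetical sketch of the naming scheme; not the actual Cast.scala code.
final class ExactHelperNames {
  private ExactHelperNames() {}

  // Upper-cases the first character, like Scala's String.capitalize.
  static String capitalize(String s) {
    return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
  }

  // from and target are Java primitive names such as "long" or "byte"
  // (an assumption about what integralPrefix returns).
  static String helperName(String from, String target) {
    return from + "To" + capitalize(target) + "Exact";
  }

  // Builds the one-line expression that the generated code would contain.
  static String exactCastCall(String from, String target, String inputVar) {
    return "CastUtils." + helperName(from, target) + "(" + inputVar + ")";
  }
}
```

Because every source/target pair maps to a name of the same shape, the codegen no longer needs a per-target branch.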

### Why are the changes needed?

Part of SPARK-56908 (umbrella). The original byte/short ANSI cast bodies
were 5 lines each across 8 call sites; this PR collapses them to one
line per call site, matching the int/long target work from SPARK-56909.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source
text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite \
  *ExpressionClassIdentitySuite"
```

312/312 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

### What changes were proposed in this pull request?

Extend `CastUtils.java` with two helpers for decimal precision adjustment
and use them from `Cast.changePrecision` (both the eval and codegen
implementations). The new helpers mutate the input `Decimal` in place
(matching the behavior of the existing inline codegen), so they're safe
to call on the temporary produced by `Decimal.fromString(...)` /
`Decimal.apply(...)` / decimal-arithmetic results.

Helpers added:
* `changePrecisionExact(Decimal, int, int, QueryContext)`: ANSI; throws on
  overflow and takes the per-call-site `QueryContext` so error messages
  keep their query-origin info.
* `changePrecisionOrNull(Decimal, int, int)`: non-ANSI; returns `null`
  on overflow (no `QueryContext` needed).
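
To make the contract of the two helpers concrete, here is a minimal sketch that uses a small mutable wrapper around `java.math.BigDecimal` in place of Spark's `Decimal`. Everything in it is an assumption for illustration: `MutableDecimal`, the `String` standing in for `QueryContext`, and the `ArithmeticException` are not the real types.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative stand-in only; not the actual CastUtils implementation.
final class DecimalPrecisionSketch {
  private DecimalPrecisionSketch() {}

  // Tiny mutable stand-in for Spark's Decimal.
  static final class MutableDecimal {
    BigDecimal value;

    MutableDecimal(String s) {
      value = new BigDecimal(s);
    }

    // Rescales in place with HALF_UP rounding. Returns false, leaving the
    // value untouched, when the result needs more than `precision` digits.
    boolean changePrecision(int precision, int scale) {
      BigDecimal rounded = value.setScale(scale, RoundingMode.HALF_UP);
      if (rounded.precision() > precision) {
        return false;
      }
      value = rounded;
      return true;
    }
  }

  // ANSI flavor: returns the same (mutated) instance or throws.
  static MutableDecimal changePrecisionExact(
      MutableDecimal d, int precision, int scale, String context) {
    if (d.changePrecision(precision, scale)) {
      return d;
    }
    throw new ArithmeticException(
        d.value + " cannot be represented as Decimal(" + precision + ", " + scale
            + ") at " + context);
  }

  // Non-ANSI flavor: returns the same (mutated) instance or null.
  static MutableDecimal changePrecisionOrNull(MutableDecimal d, int precision, int scale) {
    return d.changePrecision(precision, scale) ? d : null;
  }
}
```

Both helpers hand back the instance they were given, which is why they are only appropriate for temporaries that no other code still references.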

`Cast.scala` changes:
* `changePrecision` eval method dispatches on `nullOnOverflow` and
  delegates to the appropriate helper.
* `changePrecision` codegen method has three branches now: the existing
  `canNullSafeCast` fast path (unchanged), a `nullOnOverflow` branch
  (inline), and the ANSI throw branch which now emits a one-line
  `CastUtils.changePrecisionExact(...)` call instead of the 5-line
  `if/else` overflow block.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). The ANSI throw branch of
`Cast.changePrecision` is hit by every cast to decimal that may overflow
(very common in TPC-DS, where `cast(int as decimal(7,2))` is widespread).
Collapsing the 5-line inline body to one line shrinks the generated
Java source for those plans.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source
text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite \
  *ExpressionClassIdentitySuite"
```

337/337 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

### What changes were proposed in this pull request?

Use `CastUtils.changePrecisionExact` / `changePrecisionOrNull` (added in
SPARK-56911) from the `DecimalType.Fixed` branches of:
* `BinaryArithmetic.doGenCode` (covers `Add` / `Subtract` / `Multiply`
  on `Decimal`).
* `BinaryDivModLike.doGenCode` (covers `Divide` / `IntegralDivide` /
  `Remainder` / `Pmod` on `Decimal`).
* `BinaryArithmetic.checkDecimalOverflow` (eval path used by both
  groups via `numeric.plus`/`minus`/`times`/`div`).

Each call site goes from
`eval1.$op(eval2).toPrecision(p, s, ROUND_HALF_UP, !failOnError, ctx)`
+ a 4-line null check to a single
`CastUtils.changePrecision{Exact,OrNull}` call.
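
The choice between the two helpers at codegen time could be sketched as follows. This is a hypothetical illustration: `DecimalArithmeticCodegenSketch` and its parameters are invented for the example, and the exact text Spark emits may differ. Only the helper names and argument order come from the signatures introduced in SPARK-56911.

```java
// Hypothetical sketch of picking the helper at codegen time;
// not the actual BinaryArithmetic.doGenCode implementation.
final class DecimalArithmeticCodegenSketch {
  private DecimalArithmeticCodegenSketch() {}

  // method is the Decimal method name as seen from Java, e.g. "$plus";
  // contextRef is the generated-code expression for the QueryContext.
  static String resultExpression(
      String method, String eval1, String eval2,
      int precision, int scale, boolean failOnError, String contextRef) {
    String raw = eval1 + "." + method + "(" + eval2 + ")";
    if (failOnError) {
      // ANSI: throw on overflow, keeping the query context for the message.
      return "CastUtils.changePrecisionExact("
          + raw + ", " + precision + ", " + scale + ", " + contextRef + ")";
    }
    // Non-ANSI: overflow yields null, so no context is needed.
    return "CastUtils.changePrecisionOrNull(" + raw + ", " + precision + ", " + scale + ")";
  }
}
```

For `failOnError = true`, the sketch yields a single expression of the form `CastUtils.changePrecisionExact(a.$plus(b), 10, 2, ctx)`.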

### Why are the changes needed?

Part of SPARK-56908 (umbrella). Decimal arithmetic is widespread in
TPC-DS plans, and the `BinaryArithmetic` Decimal branch was one of the
longer ANSI codegen bodies still emitted inline.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source
text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *ArithmeticExpressionSuite *DecimalSuite"
```

60/60 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

@gengliangwang force-pushed the SPARK-56914-decimal-arithmetic branch from ee5c3fb to 2a324d8 on May 18, 2026 17:07